ABSTRACT

The geographical location of Internet IP addresses is
important for both academic research and commercial
applications. Thus, both commercial and academic databases and
tools are available for mapping IP addresses to geographic
locations.

Evaluating the accuracy of these mapping services is
complex since obtaining diverse, large scale ground truth is very
hard. In this work we evaluate mapping services using an
algorithm that groups IP addresses into PoPs, based on structure
and delay. This way we are able to group close to 100,000
IP addresses worldwide into groups that are known to share
a geo-location with high confidence. We provide insight into
the strengths and weaknesses of IP geolocation databases, and
discuss their accuracy and the anomalies encountered.

1. INTRODUCTION 

Geolocation services have in recent years become
a necessity in many fields and for many applications.
While end users are usually not aware of it, many of the
websites they visit every day use geolocation
information. Common uses of geolocation information
include targeted localized advertising, localized
content (such as local news and weather), and
compliance with local law.

Perhaps the most highlighted purpose of geolocation 
information is for fraud prevention and various means of 
security. Banking, trading, and almost any other type 
of business that handles online money transactions are 
exposed to phishing attempts as well as other schemes. 
Criminals try to break into user accounts to transfer 
money, manipulate stocks, make purchases and more. 
The geolocation information provides means to reduce 
the risk, for example by blocking users from certain 
high-risk countries, cross-referencing user expected and 
actual location and more. Organizations that handle
national security, like the DHS cyber security center [37],
find geolocation information useful as well. Even simple
emergency services use it, for example to dispatch emergency
responders to the location of an emergency.

* This work was partially funded by the OneLab II and
the MOMENT consortia, which are partly financed by the
European Commission; and by the Israeli Science Foundation,
center of knowledge on communication networks (grant
#1685/07).

Geolocation information is also important in many 
research fields. It improves internet mapping and char- 
acterization, as it ties the internet graph to actual node 
positions, and allows exploring new aspects of the net- 
work that are otherwise uncovered, such as the effect of 
ISP location on its services and types of relationships 
with other service providers. 

Many previous papers have discussed the usage of ge- 
olocation information in day-to-day applications. They 
vary in fields from law [22l |T3l HI] through informa- 
tion security and fraud prevention [271 IH] to various 
economic aspects [23l [25]. However, not many works 
have focused on the accuracy of geolocation databases. 
In 2008, Siwpersad et al. [TO] examined the accuracy
of Maxmind [29] and IP2Location [17]. They assessed
their resolution and confidence area and concluded that
their resolution is too coarse and that active
measurements provide a more accurate alternative. Gueye et
al. [15] investigated the imprecision of relying on the
location of blocks of IP addresses to locate Internet hosts
and showed that the geographic area spanned by blocks
can be far larger than the typical distance between any
two IPs within a block. This indicates that geolocation
information coming from exhaustive tabulation
may contain an implicit imprecision.

The IETF has also begun defining standards
for geolocation and emergency calling. The IETF GEOPRIV
working group [19] discusses Internet geolocation
standards and privacy protection for geolocation.
Some examples are DHCP location, as in RFC3825 and
RFC4776, and protocols for discovering the
local location information server [45].

Muir and Oorschot [31] conducted a survey of
geolocation techniques used by geolocation databases and
examined means for evasion/circumvention from a
security standpoint.

Improving location accuracy by measurements has
been addressed by several works in recent years.
IP2Geo [32] is one of the first to suggest a
measurement-based approach to approximate the geographical
distance of network hosts. A more mature approach is
constraint based geolocation [16], which uses several
delay constraints to infer the location of a network host by
a triangulation-like method. Later works, such as
Octant [46], use a geometric approach to localize a node
within a 22 mile radius. Katz-Bassett et al. [24] suggested
topology based geolocation using link delay to improve
the location of nodes. Yoshida et al. [47] used
end-to-end communication delay measurements to infer PoP
level topology between thirteen cities in Japan. Laki et
al. [34] increased geolocation accuracy by decomposing
the overall path-wise packet delay into link-wise
components and were thus able to approximate the overall
propagation delay along the measurement path.
Eriksson et al. [7] apply a learning based approach to
improve geolocation. They reduce IP geolocation to a
machine learning classification problem and use a Naive
Bayes framework to increase geolocation accuracy.

In this paper we study the accuracy of geolocation 
databases. The main problem in such a study is the 
lack of ground truth information, namely a large and 
diverse set of IP addresses with known geographic lo- 
cation to compare the geolocation databases against. 
We avoid this need by using a different approach: we use
an algorithm, whose main features are summarized in
Section 3.1, for mapping IP addresses to PoPs (Points
of Presence). The algorithm, which is based both on
delay measurements and graph structure, has a very
small probability of mapping two IP addresses that are
not co-located to the same PoP. Thus, while we do not
know the location of the PoP, we know that all the IP
addresses within a PoP should reside in the same
location. This serves as a means to check the coherency of a
geolocation database: if two IP addresses in the same
PoP are mapped to different locations, the database has
a problem, and we can use the distances among the
various locations of IP addresses in the same PoP as
a measure of database accuracy. The results are
presented in Sec. 4.1.

We take a step further and compare the results of multiple
databases for the same PoP (Sec. 4.2). In case a majority
of the results in each database are identical we can ex- 
pect that for each database a majority vote will give us 
the correct location of the PoP, and the spread of the 
locations will give us the confidence measure of the re- 
sult. This will help us to identify cases where a database 
reports for a large portion of the IP addresses of some 
ISP the same default location (usually the ISP head- 
quarters). 

2. GEOLOCATION SERVICES 

Geolocation services range from free services, through
services that cost a few hundred dollars, up to
services that cost tens of thousands of dollars a year.
This section surveys most of these services, focusing on
the main players.

Free geolocation services differ from one another in
nature. Three representatives of such sources are
discussed here: DNS resolution, Google Gears and HostIP.Info.

DNS resolution was probably the first source for
geolocation information, being free and available to all
users. In 2002 Spring et al. [41] used DNS names to
improve location information as part of the Rocketfuel
project. The UnDNS tool they provided is still used to
uncover location from DNS names. However, DNS
suffers from several problems: many interfaces do not have
a DNS name assigned to them, and incorrect locations
are inferred when interfaces are misnamed [48]. In
addition, there are no rules for inferring locations from
all DNS names, so some manual adjustment is required.

As part of the Google Labs Gears API, Google provides a
geolocation API [12] that allows querying a user's
current position. The position is obtained from onboard
sources, such as GPS, a network location provider, or
from the user's manual input. When needed, the
location API can also send various signals that
the device has access to (nearby cell sites, WiFi nodes,
etc.) to a third-party location service provider, which
resolves the signals into a location estimate [13]. Thus,
the granularity of the service is a single IP address
and not an address block.

HostIP.Info [18] is an open source project, with many of
its APIs contributed by its community. The data is collected
from users who provide direct feedback through the API,
as well as from ISP feedback. In addition, website visitors
update their location, which in turn is stored
as a database entry. The city data comes from various
sources, such as data donations and US census data (for
the USA). The data is provided as /24 CIDR blocks.

Another type of geolocation services emerges from 
universities and research institutes. These services tend 
to use measurements, entirely or on top of other method- 
ologies, in order to improve geolocation data quality. 
While many of the measurement based geolocation
services that we discussed in Section 1 do not provide the
ability to query specific IP addresses [24, 46, 47], one
online geolocation service that does allow it is
Spotter [35], which is based on a work by Laki et al. [34].
Spotter uses a detailed path-latency model to determine 
the overall propagation delays along the network paths 
more accurately, which in turn translates to more ac- 
curate geographic distance estimation. The evaluation 
process also takes into account the discovered topol- 
ogy between the measurement points, and end-to-end 
latency measurements as well. One-way delay measure- 
ments further increase the accuracy of router geoloca- 
tion techniques. 

Mid-range cost geolocation services include databases
such as MaxMind GeoIP, IPligence, and IP2Location.
All these databases cost a few hundred US dollars
and provide the user with a full database, typically as a flat
file or MySQL dump. Some of the companies, such as
MaxMind, also provide a geolocation web service.

MaxMind [29], founded in 2002, is one of the pioneers in
geolocation. It provides a range of databases,
from country level to city level with longitude and latitude.
Information on ISP and netspeed can be retrieved as
well. In addition, MaxMind offers
enterprises a database with an accuracy radius for its
geolocation information. In this work, the MaxMind
GeoIP City database is used for geolocation
information. IPInfoDB [1] is a free geolocation service
that uses the MaxMind GeoIP lite database and adds
reserved addresses and an optional timezone on top of it.

IPligence [21] is a geolocation service provider that has
existed since 2006. Its high end product, IPligence Max,
provides geographic information such as country, region
and city, longitude and latitude, in addition to
general information such as owner and timezone. Hexasoft
Development maintains IP2Location [17], a geolocation
database with a wider range of geolocation information:
from IP to country conversion, to retrieving
information such as bandwidth and weather. For this study,
we used their DB5 database, which maps IP addresses
to country, region, city, latitude, and longitude. In all
the above products, IP address locations are given
in ranges, which vary in size and can be as small as
a handful of addresses per range.
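
Since these products deliver locations per address range rather than per address, a query amounts to finding the range that contains a given IP. The following is a minimal sketch of such a lookup, assuming non-overlapping ranges given as (first address, last address, location) rows; the function names, the row layout and the sample coordinates are illustrative assumptions and do not reflect any vendor's actual file format.

```python
import bisect
import ipaddress


def build_index(ranges):
    """Sort (first_ip, last_ip, location) rows by numeric start address.

    Assumes the ranges do not overlap (hypothetical input layout).
    """
    rows = sorted(
        (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), loc)
        for first, last, loc in ranges
    )
    starts = [row[0] for row in rows]
    return starts, rows


def locate(index, ip):
    """Return the location of the range containing ip, or None if uncovered."""
    starts, rows = index
    key = int(ipaddress.ip_address(ip))
    # Index of the last range that starts at or before the queried address.
    i = bisect.bisect_right(starts, key) - 1
    if i >= 0 and rows[i][0] <= key <= rows[i][1]:
        return rows[i][2]
    return None


# Illustrative data only: documentation addresses with made-up coordinates.
index = build_index([
    ("192.0.2.0", "192.0.2.7", (48.85, 2.35)),
    ("192.0.2.16", "192.0.2.255", (52.52, 13.40)),
])
```

An address that falls in a gap between ranges returns None, which corresponds to what is treated later in this paper as a NULL reply.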

High end geolocation services are often priced by the
number of queries, and their cost may reach tens of
thousands of dollars a year for large websites. Amongst
these services, based on their pricing level, are
Quova, Akamai Edge Platform, Digital Element's
NetAcuity Edge and Geobytes. Each of these companies
boasts large tier-1 customers from different fields,
who use its services for targeted advertising,
fraud prevention, and more.

Quova [2], founded in 1999, provides three levels of
data: bronze, silver, and gold. The advanced
services contain attributes such as location
confidence level, Designated Market Area (DMA), and
status designations for anonymized Internet connections.
Quova's database is based on random forest classifier
rules, synthesis rules, approved location labels,
hand-labeled hostnames, and research notes, with 6 patents
issued and 9 patents pending.

Akamai [3], which was founded in 1998 and launched its
commercial service in 1999, provides IP location information
through its Edge Platform product. Its IP location
services are part of a much larger package of tools and
applications used for traffic management, dynamic site
acceleration, performance enhancement and more.

Digital Element [6], founded in 2005, provides two levels
of geolocation information under the products NetAcuity
and NetAcuity Edge, with over thirty nine data
points, including demographics, postal code, and
business type. The IP geolocation data source is anonymous
data gathered from interactions with users. One source
for this user information is partner companies that use
the product. The information is validated using a
proprietary clustering analysis algorithm. The data
collection and analysis are protected by more than 20 issued
and pending patents.

Table 1: Geolocation Database Accuracy

Database      Country Level   City Level   USA City Level
IP2Location   99%             80%
MaxMind

Geobytes [9] launched its GeoSelect product, which
provides geolocation information, in 2002. The data
provided by Geobytes matches that of mid-range companies in its
wealth, but it is part of a broader package of services,
including reports, user redirection, etc. While in the
past Geobytes used ICMP packets to create an
infrastructure map, current methods also include gathering
information from websites that require users to enter
their location information and then mapping this data
onto Geobytes' infrastructure map of the Internet [30].
No DNS information is used by Geobytes for its
location resolution.

In this work, databases from all three groups are
used. The no-charge databases are HostIP.info,
Spotter and DNS (partial). The mid-range databases used
are MaxMind GeoIP City, IPligence Max, and
IP2Location DB5. GeoBytes and NetAcuity are the last
two databases used in this work. Unfortunately, we
failed to reach a collaboration with Quova and Akamai
for this project.

2.1 Databases Accuracy 

The geolocation service provider is, in many cases, 
the sole source for database accuracy information. Some 
vendors do not publish accuracy figures at all, such as 
IPligence, while others provide accuracy figures without 
explaining how they were obtained. A few geolocation 
services, such as Akamai and Quova provide accuracy 
figures obtained by external auditors. Table 1 provides
a summary of accuracy figures, as given by the geolo- 
cation service providers on their websites [2] [3l |6l [H [29l 
[TT ]. The table includes information on country level 
accuracy, city level accuracy world wide and city level 
accuracy in the USA. 

All the databases claim to have 97% accuracy or more 
at the country level and 80% or more at the city level. 

'State level accuracy 



MaxMind provides detailed expected accuracy at the city
level by country [28]. The accuracy ranges from
40% to 44% in countries like Nigeria and Tunisia to 94% to
95% in countries like Georgia, Qatar and Singapore.
An accurate resolution here is considered one within 25
miles of the true location. NetAcuity accuracy figures
are based on a test by Keynote Systems, which resulted
in an exact match for every IP address at the country
level, and at the state level for those IP addresses located in
the United States. At the city level NetAcuity delivered
97% accuracy. Quova's accuracy figures are based on
an audit by PricewaterhouseCoopers [26], which used
3 third party reference databases. Here 99.9% accuracy
was achieved at the country level and 97.2% to 98.2%
at the state level.

The accuracy of the figures in Table 1 cannot be
easily evaluated. For example, neither the means by
which Keynote Systems tested NetAcuity nor the
reference databases used to test Quova are revealed. Akamai
claims 97.2% accuracy at the city level worldwide
and 100% accuracy at the city level in the USA. The
source for this is a report by Gomez [20], which defined
a node location to be unique on /23 CIDR subnets. In
addition, a Census Metropolitan Area (CMA) is the
basis of the naming convention used by Gomez to identify
the physical location of its measurement nodes. The
accuracy of this method is thus debatable, as described
in Sec. [H 

Assessing the accuracy of geolocation databases is
therefore a hard problem, since large scale ground truth
does not exist (or is hard to obtain). In this work a
structural approach is taken to evaluate the accuracy
of databases, gaining greater knowledge on each IP
address by locating it as part of a PoP-level map.

3. THE EVALUATION MODEL 

3.1 Building PoP Maps 

We define a PoP as a group of routers which belong to
a single AS and are physically located in the same
building or campus. In most cases [36, 14] the PoP consists
of two or more backbone/core routers and a number
of client/access routers. The client/access routers are
connected redundantly to more than one core router,
while core routers are connected to the core network of
the ISP. The algorithm we use for PoP extraction was
first suggested by Feldman and Shavitt in [8] and later
improved in [39]. The algorithm looks for bipartite
subgraphs with certain weight constraints in the IP
interface graph of an AS; no aliasing to routers is needed.
The bipartite subgraphs serve as cores of the PoPs and are
extended with other nearby interfaces.

The initial partitioning removes all edges with delay
higher than $PD_{max\_th}$, the PoP maximal diameter
threshold, and edges with a number of measurements below
$PM_{min\_th}$, the minimal number of required measurements.
$PM_{min\_th}$ is introduced in order to consider only links with a
highly reliable delay estimation, to avoid false indication
of PoPs. The resulting non-connected graph $G'$ contains
induced subgraphs, each of which is a candidate to become one
or more PoPs. There are two reasons for a connected
group to include more than a single PoP. The first and
most obvious reason is geographically adjacent PoPs,
e.g., New York, NY and Newark, NJ. The other is
wrong delay estimation of a small number
of links. For instance, a single incorrectly estimated link
between Los Angeles, CA and Dallas, TX might unify the
groups obtained by such a naive method.
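
The initial partitioning step can be sketched as an edge filter followed by a connected components search. This is a minimal illustration under stated assumptions, not the implementation of [8] or [39]: the edge tuple layout, the function name and the default threshold values are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def initial_partition(edges, pd_max_th=5.0, pm_min_th=5):
    """Return candidate groups as sets of interface identifiers.

    edges: iterable of (u, v, median_delay_ms, num_measurements).
    pd_max_th, pm_min_th: illustrative defaults (assumptions).
    """
    adjacency = defaultdict(set)
    for u, v, delay, count in edges:
        # Drop edges that are too long or measured too few times.
        if delay <= pd_max_th and count >= pm_min_th:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search over the surviving edges.
        stack, group = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(group)
    return groups
```

Each returned group is only a candidate: as noted above, it may still contain several adjacent PoPs, which is why the later partitioning steps are needed.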

Next, the algorithm checks whether each connected group
can be partitioned into more than one PoP, using
parent-child classification according to the measurement
direction in the bipartite graph. Further localization is
achieved by dividing the parent and child groups
into physical collocations using the high connectivity of
the bipartite graph. If a parent pair group and a child pair
group are connected, the weighted distance between the
groups is calculated (if they are connected, by
definition more than one edge connects the two groups); if
it is smaller than a certain threshold, the pair of groups
is declared part of the same PoP. Last, loosely
connected parts of the PoP are unified.
To this end, the algorithm looks for connected
components (PoP candidates) that are connected by links
whose median distance is very short (below $PD_{max\_th}$).

In the original algorithm [8], an additional step called
Singleton Treatment was implemented, in which nodes
with only one or two links are assigned to PoPs based
on their median distance. This step may add to the PoP
IP addresses that are not necessarily part of it. Thus, in
this work, two PoP level maps were generated: one map
without any singletons, which is considered accurate
with respect to the PoP IP addresses only, and a second
map that includes singletons. The aim of the second
map is to improve location estimation where the PoP
location is undetermined based on the first map only. As
the singletons are necessarily in the vicinity of the PoP,
using them does not harm the location estimation.

In a previous work [39], the stability and correctness
of the PoP extraction algorithm were discussed, as well
as the effect of threshold settings. For this paper's
purposes, the sensitivity to the thresholds should be mentioned,
as they may affect the geolocation accuracy. Figure 1
explores the PoP extraction algorithm's sensitivity to
$PD_{max\_th}$. In the figure five ISPs are explored: Level
3, AT&T, Comcast, MCI, and Deutsche Telekom. The
figure presents the number of IPs included in PoPs when
changing $PD_{max\_th}$. Neither the number of discovered
PoPs nor the number of IPs within the PoPs is
sensitive to the delay threshold, as long as the threshold is
3mSec or above. $PD_{max\_th}$ was selected to be 5mSec,
as it presents a good tradeoff between delay
measurement error and location accuracy. The number of
IPs included in PoPs decreases as the minimal
number of required measurements, $PM_{min\_th}$, increases, as
can be expected (see [39]). In our extracted PoP maps,
$PM_{min\_th}$ was selected to be 5.

Figure 1: Number of IPs in PoPs vs. Maximal Delay

3.2 Data Evaluation Method 

The evaluation of geolocation databases is conducted
using the classification of IP addresses into PoPs as
described above. Since the classification is based on
both structure and delay measurements, the chances
that two IP addresses which our algorithm maps to the
same PoP are not located in the same geographical
location are slim. We do recognize that when two PoPs
are very close (within a few tens of kilometers) our
algorithm may unify them into one. However, in this case the
median of their locations is half their distance away, namely
not far.

To identify the geographical location of a PoP, we use
the geographic location of each of the IPs included in it.
As all the PoP IP addresses should be located within
the same campus, or within its vicinity if singletons are
considered, the location confidence of a PoP is
significantly higher than the confidence that can be gained
from locating each of its IP addresses separately. The
algorithm, introduced in [39], operates as follows:

Initial Location Each of the evaluated geolocation
databases is queried for the location (longitude,
latitude) of each IP included in the PoP. Next, the center
weight of the PoP location is found by calculating the
median of all the PoP's IP locations. Unlike an average,
where a single wrong IP can significantly
deflect a location, the median provides a better suited
starting point. The median is certainly not a guarantee for good
results. If there is complete disagreement between
geolocation databases as to the location of a PoP, e.g., if
one of them places all the PoP IPs in London, and the
other in New York, the median may be far away from
any of the suggested locations. However, since
geolocation databases are typically reliable in country-level
assignment, such an example is highly unlikely. We con- 
sider this assumption later in section ID 
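
The robustness of the median as a starting point can be illustrated with a short sketch. It assumes that the median is taken separately over latitudes and longitudes, which is one plausible reading of the description above and not necessarily the exact procedure of [39]; the function name and the sample coordinates are illustrative.

```python
from statistics import mean, median


def median_location(points):
    """Coordinate-wise median of (lat, lon) pairs; None entries are skipped.

    Taking the median per coordinate is an assumption made for this sketch.
    """
    located = [p for p in points if p is not None]
    if not located:
        return None
    return (median(p[0] for p in located), median(p[1] for p in located))


# Four replies place the PoP in New York, one wrong reply places it in London.
replies = [(40.71, -74.0)] * 4 + [(51.5, -0.12)]
start = median_location(replies)
# For comparison: the average is pulled away from New York by the outlier.
average = (mean(p[0] for p in replies), mean(p[1] for p in replies))
```

In this example the median stays at the New York coordinates, while the average is shifted by more than two degrees of latitude and almost fifteen degrees of longitude.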

Location Error Range Every PoP location is
assigned a range of convergence, representing the expected
location error range based on the information received
from the geolocation databases. As the PoP location
is given as [latitude, longitude], in units of degrees, so
is the range of convergence. This stage is done
iteratively, looking for a majority vote for the PoP location.
For every IP address in a PoP and for every geolocation
database we collect the geographic coordinates; thus if
there are $N$ IP addresses and $M$ databases, and all
the databases suggest a location for each of the IP
addresses, then $N \times M$ IP address elements are
considered for the vote. The algorithm starts at the
median location, and checks if there is a majority vote for
the PoP location within a radius of 0.01 degrees (one
latitude/longitude degree is roughly equivalent to 111km).
If the circle includes less than 50% of the located IP
elements, we increase the radius of the
circle by 0.01 degrees each step, until the PoP location
has a majority vote. Alternatively, the algorithm stops
when the circle radius reaches a predefined threshold,
typically 1 or 5 degrees, which we define as the
maximal range of error. If one of the geolocation databases
lacks information on an IP address, this IP element is
not counted in the majority vote. With a majority vote
we ensure most of the geolocation databases agree on
the PoP location.
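
The iterative search for the range of convergence can be sketched as follows. This is a hedged illustration, not the code of [39]: it assumes that the circle is measured as a plain Euclidean distance in degree space (since the radius is given in degrees), and the function name and parameters are hypothetical.

```python
import math


def range_of_convergence(center, elements, step=0.01, max_range=5.0):
    """Smallest radius (in degrees) holding at least 50% of located elements.

    center: (lat, lon) starting point, e.g. the median location.
    elements: (lat, lon) per IP and database; None marks a missing reply
    and is not counted in the vote. Returns max_range if no radius up to
    the threshold reaches a majority, and None if nothing is located.
    """
    located = [e for e in elements if e is not None]
    if not located:
        return None
    n_steps = int(round(max_range / step))
    for k in range(1, n_steps + 1):
        radius = k * step
        inside = sum(
            1 for lat, lon in located
            # Euclidean distance in degrees: an assumption of this sketch.
            if math.hypot(lat - center[0], lon - center[1]) <= radius
        )
        # The text continues while the circle holds less than 50%.
        if 2 * inside >= len(located):
            return radius
    return max_range
```

A caller would typically distinguish a PoP that converged below the threshold from one that returned `max_range`, since the latter indicates that the databases do not agree on a location.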

Location Refinement After a range of convergence 
is found, the PoP location accuracy is further improved. 
A new median location of the PoP is calculated, based 
only on IP elements that are located within the range 
of convergence. This ensures that deviations in the PoP 
location caused by a small number of IP elements out- 
side the range of convergence are discarded, and the 
PoP is centered based only on credible IP addresses. 
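
A minimal sketch of the refinement step is given below, under the same assumptions as before (coordinate-wise median, Euclidean distance in degree space); the function name is hypothetical.

```python
import math
from statistics import median


def refine_location(center, elements, radius):
    """Recompute the median using only elements within radius of center.

    Distances are Euclidean in degree space, an assumption of this sketch.
    Falls back to the original center if no element lies inside the circle.
    """
    inside = [
        e for e in elements
        if e is not None
        and math.hypot(e[0] - center[0], e[1] - center[1]) <= radius
    ]
    if not inside:
        return center
    return (median(e[0] for e in inside), median(e[1] for e in inside))
```

Elements outside the range of convergence, such as a single reply several degrees away, no longer influence the final PoP coordinates.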

The result of the PoP geolocation algorithm includes
the following new parameters per PoP: longitude,
latitude, range of convergence, the percentage of IP
addresses within the convergence range out of all IP addresses,
and the percentage of IP addresses within the
convergence range considering only IP elements with location
information.

To validate the correctness of the generated PoP
geolocation maps, results were compared against PoP maps
published by ISPs, such as Sprint [42], Qwest [33],
Global Crossing [10], British Telecom [5], AT&T and
others. In addition, we reported [39] a limited small
scale test of the geolocation accuracy based on 50
known university locations. The test was based only on
three databases: MaxMind, IPligence and HostIP.info.
For 49 out of 50 universities, the location was accurate
within a 10 kilometer radius. The last PoP,
belonging to the University of Pisa, was located by the
algorithm in Rome, due to an inaccuracy in the MaxMind
and IPligence databases. Only HostIP.info provided the
right coordinates for this PoP. Each PoP location was
also validated against its DNS name, whenever a DNS
name was assigned to the interface.

Figure 2: Map Of DIMES Agents, March-2010

Figure 3: Map Of Discovered PoPs, March-2010

3.3 Dataset 

The dataset for the PoP level maps is taken
from DIMES [38]. We use all traceroute measurements
taken during March 2010, totaling 126.7 million, namely
an average of 4.2 million measurements a day. The
measurements were collected from over 1750 vantage
points, which are located in 74 countries around the
world, as shown in Figure 2. About 16% of the vantage
points are mobile.

The 126.7 million measurements produced 7.85
million distinct IP level edges (no IP level aliasing was
performed). Out of these, 1.3 million edges were measured
five times or more, thus meeting $PM_{min\_th}$, and 642K
edges had a median delay of less than $PD_{max\_th}$ and were
therefore considered by the PoP extraction algorithm.
As described above, two PoP level maps were
generated by the PoP extraction algorithm, with and
without the addition of singletons. A total of 3800 PoPs were
discovered, containing 52K IP addresses from the first
run, and 104K IP addresses from the second run,
meaning with singletons. Although the number of discovered
PoPs is not large, as the algorithm currently tends to
discover mainly large PoPs while missing many access
PoPs, the large number of IP addresses and the spread
around the world (see below) allow a large scale and
meaningful evaluation of geolocation databases.

Figure 3 shows the geographical location (as
calculated by our algorithm) of the PoPs discovered by the
PoP algorithm. The PoPs are spread all over the world,
in all five continents, with a high density of PoPs in
Europe and North America. As can be seen, PoPs are
located even in places such as Madagascar and Papua
New Guinea, which shows the vast range of
location information required from the geolocation databases
in this evaluation.

The following databases were studied in this work:
MaxMind GeoIP, IPligence Max, IP2Location DB5,
HostIP.info, GeoBytes, NetAcuity, DNS and Spotter. For
most of the databases, the data used was
updated in the first week of April 2010. The NetAcuity
database was obtained in the third week of April, and
Spotter located the IP addresses during April and the
beginning of May 2010.

4. RESULTS 

4.1 Basic Tests 

4.1.1 Null Replies 

The first question asked for each database is "How
many NULL replies are returned for IP address queries?".
There are four flavors of this question. First, it is asked
only for IPs which are in the core of the PoPs, and
then it is asked for all IP addresses, including
singleton addresses. As some databases may have better
information on end users or access interfaces than on
core routers and main PoPs, this can be meaningful.
The next observation regards NULL replies that apply
to all the IP addresses within a certain PoP: does the
database fail to cover a range of addresses or a
physical location range, or are the NULL replies a matter
of missing information for a single IP address? This too is
considered both with and without singletons. Table 2
shows, for each of the databases, the percentage of IP
addresses which returned a NULL reply for each of these
questions.
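
The two levels of this question, per IP address and per PoP, can be computed as in the sketch below. The data layout (a mapping from PoP identifier to its IP addresses, and a per-database lookup dictionary) and the function name are assumptions made for illustration; the sample addresses are documentation addresses, not measured data.

```python
def null_reply_rates(pops, lookup):
    """Return (percent of IPs with no location, percent of PoPs with none).

    pops: dict mapping a PoP id to the list of its IP addresses.
    lookup: dict mapping an IP to (lat, lon); a missing key or None
    stands for a NULL reply (hypothetical layout).
    """
    total_ips = null_ips = null_pops = 0
    for ips in pops.values():
        misses = sum(1 for ip in ips if lookup.get(ip) is None)
        total_ips += len(ips)
        null_ips += misses
        # A PoP counts as NULL only if every one of its IPs is unresolved.
        if ips and misses == len(ips):
            null_pops += 1
    ip_rate = 100.0 * null_ips / total_ips if total_ips else 0.0
    pop_rate = 100.0 * null_pops / len(pops) if pops else 0.0
    return ip_rate, pop_rate


pops = {
    "pop-1": ["192.0.2.1", "192.0.2.2"],
    "pop-2": ["198.51.100.1", "198.51.100.2"],
}
lookup = {"192.0.2.1": (48.85, 2.35)}
rates = null_reply_rates(pops, lookup)
```

Running the same function once on the core PoP addresses and once on the addresses including singletons yields the four flavors of the question.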

NetAcuity and IP2Location were the only databases
to return a reply for all the queried IP addresses. For
the IP2Location database there are a few hundred NULL
entries in the entire database (for IP addresses not in
this study). This alone does not indicate that
the returned locations are correct, only that an entry
exists. The location correctness is discussed later on in
this section. At the other end of the scale, HostIP.info
failed to locate most of the IP addresses; however, at
the PoP level this percentage drops by half. It can be
assumed that HostIP.info's failures stem from a lack of
information on specific IP addresses and not on IP ranges.
Furthermore, in most cases HostIP.info does return a
reply with country information, but without longitude
and latitude. Spotter did not locate about a third of
the IP addresses. The reason for such a failure can
be either that the IP did not respond to ping, or that the
IP responded to ping but the round-trip times were too
high to provide approximations for the algorithm. Only
core PoP IP addresses, without singletons, were tested
here. For MaxMind, the percentage of NULL replies refers
to events where no specific location information was
available. In most of these cases, MaxMind does
return longitude and latitude information, which is the
center of the country where the IP is located. A list of
these coordinates is available to users, and though
we choose in this work to refer to this information as
a NULL reply, a general notion of location is provided
by the database. DNS NULL replies are less than 15%
for core PoP IP addresses, and almost 29% when taking
singletons into account. As there is a probability that
singletons represent end users and not router interfaces,
this is expected. The effect of grouping into PoPs when
looking at DNS is significant: when taking singletons
into account, only 2% of the PoPs have no location by
DNS.

Table 2: Null IP Address Information

4.1.2 Agreement within database 

By nature, IP addresses belonging to the same PoP
reside in the same area. One can leverage this information
to evaluate the accuracy of a geolocation database:
if IP addresses that belong to the same PoP are assigned
different geographical locations, then the accuracy of this
information should be questioned. This statement rests
on the assumption that the PoP algorithm is correct and
does not assign IP addresses from different locations to
the same PoP. We have already discussed why this holds,
based on the algorithm's design and on previous limited
evaluation. Our experiments here further support the
assumption: in all the PoPs evaluated, without exception,
there are always databases that support the PoP vicinity
requirement.

Figure 4: Range of Convergence Within Databases

Figure 4 presents a CDF of the convergence range
within databases, without singletons. The X axis is the
range of convergence in kilometers, on a logarithmic scale,
with 500km being the limit at which the algorithm was
stopped. The algorithm advanced in steps of 1km.
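
The stepped search described above can be illustrated with a short sketch. It is not the authors' implementation: it assumes that the range of convergence is the smallest radius at which a strict majority of a PoP's IP addresses lie within that radius of one of the reported locations, and that NULL replies count against the majority. The names `haversine_km` and `range_of_convergence` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def range_of_convergence(locations, step_km=1, limit_km=500):
    """Smallest radius, in step_km increments, at which a majority converges.

    `locations` holds one (lat, lon) tuple per IP address in the PoP, or None
    for a NULL reply. Assumption: NULL replies count against the majority.
    Returns None if no majority is found up to limit_km.
    """
    known = [loc for loc in locations if loc is not None]
    needed = len(locations) // 2 + 1  # strict majority of all IP addresses
    for radius in range(step_km, limit_km + 1, step_km):
        # Assumption: candidate centers are the reported locations themselves.
        for center in known:
            inside = sum(1 for loc in known
                         if haversine_km(center, loc) <= radius)
            if inside >= needed:
                return radius
    return None
```

Under these assumptions, a PoP whose located addresses mostly share one coordinate converges at the minimal 1km step, while a PoP with mostly NULL replies never converges.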

IPligence and IP2Location clearly have a range of
convergence far better than the other databases: over 90%
of the PoPs located using these databases have the minimal
range of convergence, one kilometer, which is in practice
the exact same location. MaxMind, Geobytes and NetAcuity
have 74% to 82% of their PoPs converge within one
kilometer. For HostIP.info, a bit less than 57% of the PoPs
converge within the minimal range, and almost all the rest
fail to converge. This is mostly caused by a lack of
information on IP addresses, as many PoPs do not have even
a single IP address with location information. The case of
Spotter is different. As its information is acquired by
measurements, having almost a third of the PoPs converge
within one kilometer is an indication of good performance.
In addition, over 82% of the PoPs converge within 100km,
and close to 98% within 500km, which is similar to or
better than most of the other databases. The slow
accumulation is expected due to measurement errors.
Perhaps the most important graph here is the All graph,
which shows the range of convergence when the information
from all databases is combined. Though the individual
databases place most of their PoPs within the minimal
range, less than 30% of the All PoPs converge within this
range, meaning that the databases disagree with one
another, though as the range grows so does the percentage
of converged PoPs. This does not necessarily mean that all
the databases have agreed on the same location, as
databases that reply with a location for every IP address
have more influence than databases with some NULL replies.
We further explore this question in Section 4.2. An
important observation is that even if a certain database
indicates that the range of convergence of a PoP is
minimal, i.e., 1km, this does not necessarily imply
accuracy, or in our case that all other databases will
agree with this location.

Figure 5: CDF of Agreement Within Databases,

Figure 7: CDF of database location deviation
from PoP majority - 500km range

Figure 8: Breakdown of deviation from PoP majority
CDF By region - 500km Range

Figures 5 and 6 present a CDF of the agreement
within databases, without singletons. The X axis marks
the percentage of IP addresses in a PoP that form the
majority, and the Y axis presents the probability of
this majority vote. For Figure 5 we set a radius of
100km, and for Figure 6 a radius of 500km, within
which a majority is required. In some cases no majority
is found, i.e., less than 50% of the IP addresses are
within any circle with the given radius. Recall that
in such a case the algorithm selects the location based
on the largest group of votes.
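
A minimal sketch of the vote for a fixed radius follows. It assumes that candidate circles are centered on the reported locations themselves (the text allows any circle), and that pairwise distances between the locations of a PoP's IP addresses have been computed beforehand. The function name `majority_vote` and the input layout are hypothetical.

```python
def majority_vote(dist_km, radius_km):
    """Pick the reported location that gathers the largest group of votes.

    dist_km[i][j] is the precomputed distance in km between the locations
    reported for IP addresses i and j of the same PoP. Returns the index of
    the chosen center and the share of IP addresses within radius_km of it.
    A share of 0.5 or less means that no majority was found, and the center
    is then simply the one backed by the largest group.
    """
    if not dist_km:
        return None, 0.0
    best_center, best_votes = None, 0
    for i, row in enumerate(dist_km):
        votes = sum(1 for d in row if d <= radius_km)
        if votes > best_votes:
            best_center, best_votes = i, votes
    return best_center, best_votes / len(dist_km)
```

The returned share corresponds to the X axis of the agreement CDFs: it is the fraction of a PoP's IP addresses that back the chosen location.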

Note that for all databases there are PoPs that had no
majority vote, meaning the locations diverged by more
than 100km or 500km. IPligence and IP2Location have
the highest probability of reaching an agreement within a
PoP, while HostIP.info and Geobytes grow at the slowest
pace. For a radius of 100km, Spotter does not reach
full agreement for almost 60% of the PoPs, probably due
to measurement accuracy limitations. Interestingly, for
less than 4% of the PoPs there is 100% agreement by
all databases, which once again does not correlate with
the single-database observations and points to a mismatch
between databases.

4.2 Comparison Between Databases 

4.2.1 Accuracy 

So far, we have discussed results that depend only
on the database itself. Next we compare the databases
based on the data collected from all databases. First,
we assess the accuracy of a database by comparing the
location of an IP address in every database to the location
of its PoP as voted by all databases.

Figure 9: CDF of Database location deviation
from PoP majority

Figure 7 depicts, for each database, the CDF of the
deviation of each IP address from the PoP majority vote.
The interesting observations here concern the 40km range,
which is a city range, and the 500km range, which can be
referred to as a region. IPligence, MaxMind and IP2Location
have a probability of 62% to 73% to place an IP address
within 40km of the PoP majority vote, with IPligence and
MaxMind placing over 80% of the IP addresses within a
500km radius. Geobytes, HostIP.info and NetAcuity
place 33% to 47% of the IP addresses within a city
range, and 48% to almost 60% within 500km of the
majority. Spotter places only 10% within the 40km range
and 30% within the same region.
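
The percentages quoted above are points on an empirical CDF. As a small illustration, assuming the per-address deviations from the PoP majority location are already available in kilometers, the city-range and region-range values can be read off as follows. The function name and the sample values are made up for the example and are not taken from the study.

```python
def fraction_within(deviations_km, threshold_km):
    """Empirical CDF value: share of deviations that are <= threshold_km."""
    if not deviations_km:
        return 0.0
    hits = sum(1 for d in deviations_km if d <= threshold_km)
    return hits / len(deviations_km)


# Made-up deviations (km) of six IP addresses from their PoP majority location.
deviations = [0.0, 12.5, 38.0, 120.0, 480.0, 5200.0]
city_share = fraction_within(deviations, 40)     # share within the city range
region_share = fraction_within(deviations, 500)  # share within the region range
```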

Some of the databases, like HostIP.info, NetAcuity,
Geobytes and Spotter, deviate less in Europe than in
the USA and the rest of the world, as depicted in
Figure 8. Other databases, such as IP2Location, have
greater deviation in Europe than in the rest of the world.
For clarity, only two of the databases are shown in
Figure 8. A drawback of all databases is that there is a
long tail of IP address locations that are placed 5000km
or more from the majority vote. Figure 9 shows
that in some databases this tail can hold 15% of the IP
addresses. Although the majority vote may be incorrect,
this indicates that at least one of the databases is
very far off from the real IP address location.

Figure 10 depicts, for each database, a scatter plot of
the range of convergence (X axis) versus the deviation
of the IP location from its PoP location based on all
databases (Y axis). The figure demonstrates that in
many cases the range of convergence is small, yet the
deviation from the PoP majority vote may be thousands
of kilometers. Furthermore, a large range of
convergence does not imply that the PoP center
is necessarily wrong, as in all databases we again see
cases where the range is large, yet the selected IP
address location is the same as the majority location from
all databases. IPligence and IP2Location demonstrate
an interesting phenomenon: though their range of
convergence is very low, the variation from the PoP
majority location is very large. This can indicate, as is
demonstrated next, that large groups of IP addresses
are assigned a single false location.

Figure 10: Database location deviation from
PoP majority vs. Range of Convergence

For MaxMind and HostIP.info there are many PoPs at
the far end of the graph, with a large range of
convergence. This is caused by a lack of information on
specific IP addresses, which does not allow these PoPs to
reach a majority vote. NetAcuity and Spotter demonstrate a
scattered behavior, meaning that both the range of
convergence and the deviation from the PoP majority vary.
For NetAcuity this means that IP addresses are assigned
distinct locations within the same area, as with different
users in the same city. Spotter suffers from a large range
of convergence for some PoPs due to NULL replies; however,
there is an obvious trend that places most of a PoP's IP
addresses within a 300km range of each other, with a
small number scattered at a larger range of convergence,
as can be expected of a triangulation-based method.

4.2.2 Correlation Between Databases 

While some of the databases have proprietary means
of gathering location information, a large portion of the
data in geolocation databases is likely to come from the
same sources, such as country information obtained from
ARIN. To examine this theory we calculate the cross
correlation between every pair of databases, on the entire
IP address location vector, and display it as a heatmap,
shown in Figure 11.

Figure 11: Cross Correlation Between
Databases - Heatmap

The strongest correlation is between IPligence and
IP2Location. As was shown in previous results, the
trends of these two databases look very similar. The
correlation between these two databases is over 0.99.
MaxMind and HostIP.info also have very high correlation
with IP2Location and IPligence, as well as between
themselves. The correlation figures above do not take
NULL replies into account. When those are considered, the
correlation of IPligence and IP2Location with MaxMind
drops to 0.8, and with HostIP.info to below 0.6. Across
all the databases, NetAcuity and Geobytes correlate
the least: 0.89. Spotter has a correlation of over 0.94
with most databases, except Geobytes, and reaches 0.97
with NetAcuity. Considering that the location given by
Spotter is never a landmark, but rather the result of delay
measurements, this is a high figure. The high correlation
between the databases indicates that in most cases the
locations returned by all the databases will be
very much alike. In cases where the location is difficult
to obtain, the answers may vary significantly
between services.
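
The pairwise comparison can be sketched as below. The text does not specify how the location vector is built, so this example makes two assumptions: each database's vector is the concatenation of its latitudes and longitudes over the queried IP addresses, and an address is dropped from a pair when either database returned NULL for it. Pearson correlation stands in for the cross correlation, and all names are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def location_correlation(db_a, db_b):
    """Correlate two databases over the IP addresses both of them located.

    db_a and db_b are lists aligned by IP address; each item is a (lat, lon)
    tuple or None for a NULL reply.
    """
    pairs = [(a, b) for a, b in zip(db_a, db_b)
             if a is not None and b is not None]
    xs = [a[0] for a, _ in pairs] + [a[1] for a, _ in pairs]
    ys = [b[0] for _, b in pairs] + [b[1] for _, b in pairs]
    return pearson(xs, ys)


def correlation_matrix(databases):
    """Map each unordered pair of database names to its correlation."""
    return {(m, n): location_correlation(databases[m], databases[n])
            for m, n in combinations(sorted(databases), 2)}
```

The resulting dictionary holds one value per database pair, which is the information a heatmap such as Figure 11 displays.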

4.3 Database Anomalies 

Though the results above may suggest that some
databases have superb location information, this is not
the case. In many cases the returned data is deceiving,
and may actually reflect a lack of information in
the database. For example, we identified 266 IP addresses
in the PoPs that belong to Qwest Communications.
Out of those, 253 IP addresses are located by
IPligence in Denver, Colorado. Looking at the raw
IPligence database, there are 20291 entries that belong to
Qwest Communications. Out of those, 20252 are located
in Denver, which is the location of Qwest's headquarters.
The phenomenon was first detected by our algorithm in
July 2009, when 70 Qwest PoPs were detected. MaxMind
assigned them to 55 different locations, HostIP.info to
46 locations, IP2Location to 35 locations, and IPligence
located them all in Denver. In response to a query at the
time, IPligence replied that "In some occasions you could
find records belonging to RIPE or any other registrar,
these are most likely not used IP addresses but registered
under their name, anything else should be empty or null".

A quite similar case exists with IP2Location. For
Cogent, 2365 out of 2879 IP addresses were located in
Washington DC, which is the location of Cogent's
headquarters. Out of 57 PoPs belonging to Cogent, only one
was not placed by IP2Location at these exact same
coordinates. For IPligence, too, all the PoPs were located
in the same place. However, MaxMind placed the PoPs
in 13 locations, Geobytes in 23 locations and NetAcuity
in 31 locations (only a handful in the Washington area).
A similar case is described in the Akamai audit by
Gomez [20]: a node in Vancouver, Canada was reported
to be in Toronto, and a node in Bangalore, India was
reported to be in Mumbai. In both cases those were the
known locations of the ISP's headquarters.

Sometimes the differences between databases are very
acute, with a reported node location being off by
thousands of kilometers and even placed in countries far
apart. One such example is shown in Figure 12. We take a
four-node PoP in ASN 703 (Verizon/UUNET/MCI
Communications) and display on a map the location of the
PoP according to each of the geolocation databases.
IPligence, IP2Location, Geobytes, NetAcuity and DNS each
internally place the PoP's four IP addresses at a single
location; however, each of them locates the PoP
differently: IPligence and IP2Location in Australia,
NetAcuity and DNS in Singapore, and Geobytes in
Afghanistan. MaxMind and Spotter lack information on these
nodes, and HostIP.info places the PoP with 66% certainty
in China. Extending our PoP view to include singletons,
thus including 10 nodes, does not change the picture.
MaxMind and Spotter have a location for one of the IP
addresses, and they place it in Singapore. IPligence and
IP2Location place 9 out of 10 IP addresses in Australia
and one in Singapore. Geobytes places this last IP address
in Singapore too, yet 6 out of 10 IP locations still point
to Kabul. The remaining three nodes are located in
Australia. Geobytes does give a low certainty rating to
the location, being 50 or less for both country and
region. NetAcuity places 8 out of 10 IP addresses in
Singapore and 2 in Australia. HostIP.info has location
information on 6 IP addresses: 3 of them are placed in
China and 3 in Australia, but in Melbourne, far from the
location designated by IPligence and IP2Location. Notably,
all the edges in this PoP have less than 3.5ms delay and
were measured between five and 173 times each.

The mismatch between databases is not uncommon.
Some examples exist inside the United States, too: in
Figure 13 we show one PoP in ASN 3549, Global Crossing,
as it is placed by the different geolocation databases
all across the country. This PoP has over 160 IP
addresses, counting singletons, and as such a majority in
each database carries more weight. IPligence places the
PoP with more than 90% majority in Springfield, Missouri.
MaxMind and IP2Location point to Saint Louis, Missouri,
with 92% and 82%, respectively. NetAcuity indicates that
the PoP is in San Jose, California, with 100% certainty,
while DNS and Spotter place the PoP in this vicinity,
within a radius of a few tens of kilometers. Geobytes has
somewhat above 59% of the locations pointing to New York,
with other common answers being spread across California
(25%). Geobytes' country certainty here was 100%, with
42% region certainty for the IP addresses it located in
New York. HostIP.info placed the PoP in Chicago with a
65% majority (28% of the locations pointed to Santa Clara,
California).

The above are not isolated incidents. Similar cases have
been found in other ASes as well, such as REACH
(AS4637), where IPligence, IP2Location and MaxMind
located a PoP in China, Geobytes located it in Australia,
while NetAcuity and Spotter put it in Silicon Valley,
USA. Other cases range from AS16735 (CTBC/Algar
Telecom), where PoP locations in Brazil were set
thousands of kilometers apart, to Savvis (AS3561), which is
another case of locations spread across the USA.



4.4 Database Changes 

One of the motivations to update geolocation databases
is the claim that they change significantly over time.
MaxMind [29] claims that it loses accuracy at a rate of
approximately 1.5% per month. IP2Location [17] states
that, on average, 5%-10% of the records in the databases
are updated every month due to IP address range
relocation and newly available ranges. Based on the
PoPs dataset, we compare this information with the
databases at our disposal. For IPligence, an average
of approximately one percent of the addresses changes
every month, with only minimal changes in some
consecutive months, such as 0.6% between November and
December 2009. In HostIP.info, 18% of the IP addresses
changed their location within nine months, meaning
an average of 2% a month. IP2Location changed only
1% of the locations over 4 months, meaning 0.25% per
month; however, the reference set here included only
10K IP addresses. For NetAcuity, running only on our
dataset of 104K IP addresses, we observe that 2.4% of
the IP addresses changed in less than a month.
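
The monthly figures above are simple averages of the total change over the observation period. The sketch below reproduces that arithmetic and, as an additional assumption that is not part of the study, shows the equivalent constant monthly rate if changes were to compound from month to month. Both function names are hypothetical.

```python
def average_monthly_change(total_change_pct, months):
    """Simple average, as used in the text: total change divided by months."""
    return total_change_pct / months


def compounded_monthly_change(total_change_pct, months):
    """Constant monthly rate (in percent) that compounds to the same total.

    Assumption for illustration only: every month the same fraction of the
    still unchanged addresses changes location.
    """
    unchanged = 1.0 - total_change_pct / 100.0
    return (1.0 - unchanged ** (1.0 / months)) * 100.0


hostip_avg = average_monthly_change(18, 9)      # 18% over nine months
ip2location_avg = average_monthly_change(1, 4)  # 1% over four months
```

The compounded rate is always slightly higher than the simple average for the same total, so the averages quoted in the text are, under this assumption, conservative.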

5. DISCUSSION 

Interpretation of the results in Section 4.2 should be
done with care. Placing a PoP at the majority center
of gravity may not always yield the true location, e.g., in
cases where a single wrong information source is used
by multiple databases.

IPligence and IP2Location share several characteristics
as well as a strong cross correlation. This is
exhibited by the high probability of a small range of
convergence and by the fast rate at which their
probability grows to reach a high level of agreement.
However, the various anomalies found in their databases
shed a different light on these results. For example, if
all the IP addresses of a certain ISP are assigned to a
single location, then the immediate effect will be a small
range of convergence and a high level of agreement.
Furthermore, the lack of NULL replies in this case may be
misleading, as the returned reply may be false. Since in
the cases of MCI/UUNET and Global Crossing, as well as in
other investigated cases, IP2Location and IPligence located
IP addresses far from Spotter's estimated location, it is
likely that their geolocation information in these cases
was wrong.

Judging MaxMind's performance has to be done carefully,
as the company does not claim to have high accuracy for
router interfaces. This is manifested in the high number
of returned NULL replies. MaxMind seems to have
a lot in common with IPligence and IP2Location, as the
cross correlation shows; however, unlike these databases,
MaxMind prefers to return a NULL or country-center
reply when a location is unknown, and thus returns fewer
false locations.

We find it hard to analyze Geobytes' performance.
The database returns a relatively large number of NULL
replies, which are significantly reduced by PoP level
aggregation. Furthermore, the granularity of the database
is a /24 CIDR, thus grouping every block of 256 addresses
into a single location. Though this is a common practice,
it has some degrading effect. The oddity here is that
there are many cases where the range of convergence
is about 250km, which means the database located the
IP addresses in each other's vicinity, but did not
consider them to be in the exact same city. However, as
Geobytes indicates that it focuses on the service area
of a PoP rather than the location of that PoP, this can
be expected. An advantage of Geobytes is that no
assignment of addresses to a single location or to ISP
headquarters has been detected so far.
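
To make the /24 granularity concrete, the following sketch groups IP addresses by their enclosing /24 block using Python's standard `ipaddress` module. It only illustrates the granularity and is not a description of how Geobytes stores its data. The function name is hypothetical and the addresses in the usage note come from documentation ranges.

```python
import ipaddress
from collections import defaultdict


def group_by_slash24(ip_addresses):
    """Group IPv4 address strings by the /24 block that contains them."""
    groups = defaultdict(list)
    for ip in ip_addresses:
        # strict=False lets ip_network() accept a host address and mask it
        # down to the network address of its /24 block.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(block)].append(ip)
    return dict(groups)
```

For example, `group_by_slash24(["192.0.2.1", "192.0.2.200"])` puts both addresses in the block `192.0.2.0/24`, so a database with this granularity would report one location for both, even if they sit in different PoPs.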

The main drawback of HostIP.info is the lack of
information. As most of the IP addresses and over a
third of the PoPs did not have any location information,
the knowledge gained on this database is limited.
The limited location information led HostIP.info to
perform worst in almost all test cases. In addition, in
more than a single case the location indicated by
the database was far off from that of any other database
or measurement-based location.

NetAcuity is probably the most expensive and most highly
touted database used in this research. The results of
the tests, however, may not live up to expectations.
Though one may assume that the majority location is
affected by errors in other databases, it can be expected
that when the database is compared to itself the
performance will be high. The results show that in over
40% of the cases, the IP addresses of the same PoP are not
all located within a 100km radius, which is in fact a
200km diameter, and close to 20% are not located within a
500km radius either. The strength of NetAcuity is that an
ISP's IP addresses are rarely assigned to a single
location, unless this is indeed a true single place. In
addition, in the several anomalous cases that were
investigated, the NetAcuity majority pointed to
the most probable correct location. Note that a minority
of IP location votes still pointed to different locations,
even in different countries.

5.1 Active Measurement Accuracy 

Active measurements are used by many geolocation 
services [JH [46j j35^ and by other projects for different 
localization tasks, most notably for assigning IP ad- 
dresses to PoPs [4r . Spotter geolocation is based solely 
on active measurements, thus we selected to study its 
performance to a greater depth due to the importance 
of understanding the limitations of this approach. 

Figures 14 and 15 show Spotter's overall performance
compared with its performance for PoPs located only in
Europe or in the USA. It is clear from both figures that
in Europe Spotter performs much better than in the USA
and slightly better than the world average. For example,
for a 40km radius (which is frequently used as a city
diameter) Spotter reaches about 78% convergence in
Europe compared to 67% convergence worldwide, and only
44% for the USA. The difference can be explained* by
the spread of vantage points used by Spotter, which are
almost entirely based on PlanetLab nodes. While in
Europe PlanetLab nodes are well spread geographically, in
the USA most PlanetLab nodes are located along the
coasts, making localization of IP addresses in the middle
of the USA less accurate. Interestingly, other databases,
which are based on other means, also achieve better
results for European addresses than for USA addresses
(see Fig. 16).

Figure 14: Breakdown of the agreement CDF
for Spotter by region.

Figure 15: Breakdown of the convergence range
CDF for Spotter by region.

Spotter's convergence (Fig. 4) starts as the lowest, which
is an outcome of the measurement errors that tend to
spread the results for different IP addresses around the
'true' location. However, at a radius of 100km it closes
the gap with most databases and reaches over 80%
convergence (and close to 90% for Europe). Still, a 20%
'error' may make distance measurements unfit as the
sole method for assigning IP addresses to PoPs.

*We consulted Peter Haga and Peter Matray from the Spotter
project on this aspect.

Figure 16: CDF of Agreement Within Databases
In Europe, 500km Radius

6. CONCLUSION

This paper presented a comprehensive study of
geolocation databases, comparing a large number of
databases of different types. The results show that while
some of the databases provide results that are well
aggregated and have a small number of NULL replies, the
accuracy of the returned locations cannot be trusted.
There is a strong correlation between all databases, which
indicates that the vast majority of location replies are
correct. However, in some cases the databases contain
errors in the range of thousands of kilometers, placing
addresses countries apart. Geolocation databases should
therefore be used with care, and their information
cannot be considered ground truth.

Our results also show that measurement-based
geolocation can achieve good results that compete with
geolocation information gathered by other means, and
that the accuracy achieved by such tools can be fairly
high. However, this accuracy may not be high enough for
them to be used as the sole tool to map IP addresses to
PoPs. Future research in this field should focus on means
to decide on ground truth when there is a disagreement
between the databases.

7. ACKNOWLEDGEMENTS 

We would like to thank Peter Matray and Peter Haga,
as well as other members of the Spotter project [35], for
providing us with their dataset and helping with this work.
We would also like to thank Frank Bobo from Digital 
Envoy, Adrian McElligott from Geobytes and Edward 
Lin from Maxmind for helping us in obtaining their 
databases and answering our questions. 

8. REFERENCES 

[1] IPInfoDB. http://ipinfodb.com, 2010.

[2] Quova. http://www.quova.com, 2010.

[3] Akamai. EdgePlatform.
http://www.akamai.com/html/technology/edgeplatform.html,
2010.

[4] ATT-Global-Services. Att global services global
network map.
http://www.corp.att.com/globalnetworking/media/network_map.swf.

[5] BT-Global-Services. Network maps.
http://www.bt.net/info/europe.shtml.

[6] Digital Envoy. NetAcuity Edge.
http://www.digital-element.com/our_technology/edge.html,
2010.

[7] B. Eriksson, P. Barford, J. Sommers, and
R. Nowak. A learning-based approach for ip
geolocation. In Passive and Active Measurement,
pages 171-180, 2010.

[8] D. Feldman and Y. Shavitt. Automatic large scale
generation of internet PoP level maps. In
GLOBECOM, pages 2426-2431, 2008.

[9] Geobytes. GeoNetMap.
https://secure.geobytes.com/GeoNetMap.htm, 2010.

[10] Global-Crossing. Global Crossing Network.
http://www.globalcrossing.com/html/map062408.html.

[11] G. Goodell and P. Syverson. The right place at
the right time. Commun. ACM, 50(5):113-117, 2007.

[12] Google. Geolocation API.
http://code.google.com/apis/gears/api_geolocation.html,
2010.

[13] Google Gears Wiki. Geolocation API.
http://code.google.com/p/gears/wiki/GeolocationAPI,
2010.

[14] B. R. Greene and P. Smith. Cisco ISP Essentials.
Cisco Press, 2002.

[15] B. Gueye, S. Uhlig, and S. Fdida. Investigating
the imprecision of ip block-based geolocation. In
PAM'07: Proceedings of the 8th international
conference on Passive and active network
measurement, pages 237-240, 2007.

[16] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida.
Constraint-based geolocation of internet hosts.
IEEE/ACM Trans. Netw., 14(6):1219-1232, 2006.

[17] Hexsoft Development. IP2Location.
http://www.ip2location.com, 2010.

[18] hostip.Info. hostip.info. http://www.hostip.info,
2010.

[19] IETF. Geopriv workgroup.
http://tools.ietf.org/wg/geopriv/.

[20] G. Inc. Akamai ip location service performance
assessment, akam01, September 2009.

[21] IPligence. IPligence Max.
http://www.ipligence.com, 2010.

[22] J. M. Jensen. Personal jurisdiction in federal
courts over international e-commerce cases.
Loyola of Los Angeles Law Review, 1507, 2006.

[23] E. Kaiser and W.-c. Feng. Helping ticketmaster:
Changing the economics of ticket robots with
geographic proof-of-work. In Proceedings of Global
Internet 2010, March 2010.






[24] E. Katz-Bassett, J. P. John, A. Krishnamurthy,
D. Wetherall, T. Anderson, and Y. Chawathe.
Towards IP geolocation using delay and topology
measurements. In The 6th ACM SIGCOMM
conference on Internet measurement (IMC '06),
pages 71-84, 2006.

[25] K. F. King. Geolocation and federalism on the
internet: Cutting internet gambling's gordian
knot. Columbia Science and Technology Law
Review, Forthcoming, 2009.

[26] P. C. LLP. Quova - report of independent
accountants.
http://www.quova.com/documents/PricewaterhouseCoopers-Audit.pdf,
October 2008.

[27] S. Malphrus. Perspectives on retail payments
fraud. Economic Perspectives, XXXIII(1).

[28] MaxMind LLC. GeoIP City Accuracy for Selected
Countries. http://www.maxmind.com/app/city_accuracy,
2008.

[29] MaxMind LLC. GeoIP.
http://www.maxmind.com, 2010.

[30] A. E. McElligott. Method and software product
for identifying network devices having a common
geographical locale. May 2007.

[31] J. A. Muir and P. C. V. Oorschot. Internet
geolocation: Evasion and counterevasion. ACM
Comput. Surv., 42(1):1-23, 2009.

[32] V. N. Padmanabhan and L. Subramanian. An
investigation of geographic mapping techniques
for internet hosts. In SIGCOMM '01: Proceedings
of the 2001 conference on Applications,
technologies, architectures, and protocols for
computer communications, pages 173-185, 2001.

[33] Qwest. IP Network Statistics.
http://66.77.32.148/index_flash.html.

[34] S. Laki, P. Matray, P. Haga, I. Csabai, and
G. Vattay. A model based approach for improving
router geolocation. Computer Networks,
54(9):1490-1501, 2010.

[35] S. Laki, P. Matray, P. Haga, T. Sebok, I. Csabai,
and G. Vattay. Spotter: A model based active
geolocation tool. (to be published), 2010.

[36] A. Sardella. Building next-gen points of presence,
cost-effective pop consolidation with juniper
routers. White paper, Juniper Networks, June
2006.

[37] H. Security. A roadmap for cyber security
research.
http://www.cyber.st.dhs.gov/docs/DHS-Cybersecurity-Roadmap.pdf,
November 2009.

[38] Y. Shavitt and E. Shir. DIMES: Let the internet
measure itself. In ACM SIGCOMM Computer
Communication Review, volume 35, Oct. 2005.

[39] Y. Shavitt and N. Zilberman. A structural
approach for pop geo-location. In Infocom
Workshop on Network Science for
Communications (NetSciCom), March 2010.

[40] S. S. Siwpersad, B. Gueye, and S. Uhlig.
Assessing the geographic resolution of exhaustive
tabulation for geolocating internet hosts. In
Passive and Active Measurement, volume 4979,
pages 11-20, 2008.

[41] N. Spring, R. Mahajan, and D. Wetherall.
Measuring isp topologies with rocketfuel. In
Proc. ACM SIGCOMM, pages 133-145, 2002.

[42] Sprint. Global IP Network.
https://www.sprint.net/network_maps.php.

[43] D. Svantesson. E-commerce tax: How the taxman
brought geography to the borderless internet.
Revenue Law Journal, 17.1, 2007.

[44] D. Svantesson. How does the accuracy of
geo-location technologies affect the law? Law
papers, 2008.

[45] M. Thomson and J. Winterbottom. Discovering
the local location information server (lis).
http://tools.ietf.org/html/draft-ietf-geopriv-lis-discovery-15,
March 2010.

[46] B. Wong, I. Stoyanov, and E. G. Sirer. Octant: A
comprehensive framework for the geolocalization
of internet hosts. In NSDI, 2007.

[47] K. Yoshida, Y. Kikuchi, M. Yamamoto, Y. Fujii,
K. Nagami, I. Nakagawa, and H. Esaki. Inferring
pop-level isp topology through end-to-end delay
measurement. In PAM, volume 5448, pages 35-44,
2009.

[48] M. Zhang, Y. Kuan, V. Pai, and J. Rexford. How
dns misnaming distorts internet topology
mapping. In ATEC '06: Proceedings of the annual
conference on USENIX '06 Annual Technical
Conference, pages 34-34, 2006.






